The code is quite old and I do not remember all the context. I guess you can just split your file into multiple single-page files and apply this function to each of them.
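A minimal sketch of the splitting step, assuming the third-party `pypdf` package is available. The names `page_path` and `split_pdf` and the output naming scheme are hypothetical and not part of the original code; the function referred to above is not shown here, so it would be applied to each returned path separately.

```python
from pathlib import Path


def page_path(src: Path, out_dir: Path, index: int) -> Path:
    """Hypothetical naming scheme: report.pdf, page 0 -> report_p0001.pdf."""
    return out_dir / f"{src.stem}_p{index + 1:04d}.pdf"


def split_pdf(src, out_dir):
    """Write every page of `src` to its own single-page PDF in `out_dir`."""
    # pypdf is a third-party dependency (pip install pypdf). It is imported
    # here so the naming helper above stays usable without it.
    from pypdf import PdfReader, PdfWriter

    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(src))
    outputs = []
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)  # copy exactly one page into a new document
        target = page_path(src, out_dir, i)
        with open(target, "wb") as fh:
            writer.write(fh)
        outputs.append(target)
    return outputs
```

With this in place, something like `for p in split_pdf("input.pdf", "pages"): your_function(p)` would process the pages one at a time, where `your_function` stands in for the function discussed above.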
This web application is designed to parse PDF files (employee payslips), extract the details using the PDFBox API, convert them to JSON and ...
pdf2xml - extracts text from PDF files and wraps it in XML. USAGE: pdf2xml [OPTIONS] pdf-file > output.xml. OPTIONS: -c: split strings ...
This project started as an alternative to poppler's pdftoxml, which didn't properly decode CID Type2 fonts in PDFs. This script requires pdfminer. License.
pdfalto is a command line executable for parsing PDF files and producing structured XML representations of the PDF content in ALTO format, capturing in ...